A Categorial Variation Database for English
Abstract
We describe our approach to the construction and evaluation of a large-scale database called “CatVar” which contains categorial variations of English lexemes. Due to the prevalence of cross-language categorial variation in multilingual applications, our categorial-variation resource may serve as an integral part of a diverse range of natural language applications. Thus, the research reported herein overlaps heavily with that of the machine-translation, lexicon-construction, and information-retrieval communities. We apply the information-retrieval metrics of precision and recall to evaluate the accuracy and coverage of our database with respect to a human-produced gold standard. This evaluation reveals that the categorial database achieves a high degree of precision and recall. Additionally, we demonstrate that the database improves on the linkability of the Porter stemmer by over 30%.
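As an illustration of the precision and recall metrics named in the abstract, the following is a minimal sketch of scoring one predicted word cluster against a gold-standard cluster. The function name `precision_recall` and the example words are hypothetical and chosen only for illustration; the abstract does not describe the paper's actual evaluation protocol, so this should not be read as the authors' method.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of a predicted cluster against a gold cluster.

    precision = |predicted ∩ gold| / |predicted|
    recall    = |predicted ∩ gold| / |gold|
    """
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        # Convention assumed here: an empty cluster scores zero on both metrics.
        return 0.0, 0.0
    correct = predicted & gold
    return len(correct) / len(predicted), len(correct) / len(gold)


# Hypothetical clusters: categorial variants a database might link to "hunger",
# and the variants a human annotator might list for the same lexeme.
predicted_cluster = {"hunger", "hungry", "hungrily", "hunt"}
gold_cluster = {"hunger", "hungry", "hungrily", "hungriness"}

precision, recall = precision_recall(predicted_cluster, gold_cluster)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case, one spurious member ("hunt") lowers precision, and one missing member ("hungriness") lowers recall.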
Similar resources
System Demonstration CatVar: A Database of Categorial Variations for English
We present a new large-scale database called “CatVar” (Habash and Dorr, 2003) which contains categorial variations of English lexemes. Due to the prevalence of cross-language categorial variation in multilingual applications, our categorial-variation resource may serve as an integral part of a diverse range of natural language applications. Thus, the research reported herein overlaps heavily wi...
Generating Context Appropriate Word Orders in Turkish
Turkish, like Finnish, German, Hindi, Japanese, and Korean, has considerably freer word order than English. In these languages, word order variation is used to convey distinctions in meaning that are not generally captured in the semantic representations that have been developed for English, although these distinctions are also present, in somewhat less obvious ways, in English. In the next secti...
Extending CCG-based Syntactic Constraints in Hierarchical Phrase-Based SMT
In this paper, we describe two approaches to extending syntactic constraints in the Hierarchical Phrase-Based (HPB) Statistical Machine Translation (SMT) model using Combinatory Categorial Grammar (CCG). These extensions target the limitations of previous syntax-augmented HPB SMT systems which limit the coverage of the syntactic constraints applied. We present experiments on Arabic–English and ...
Distinguishing Phenogrammar from Tectogrammar Simplifies the Analysis of Interrogatives
Oehrle (1994) introduced a categorial grammar architecture in which word order is represented using the terms of a typed λ-calculus and the syntactic type system is based on linear logic. In this paper, we use a variant of this architecture, similar to λ-grammar (Muskens 2003, 2007b), to analyze interrogatives in English and Chinese. We show that separating word order (phenogrammar) and syntact...
Creating a Natural Logic Inference System with Combinatory Categorial Grammar
This dissertation presents an integrated system for producing Natural Logic inferences, which are used in a wide variety of natural language understanding tasks. Natural Logic is the process of creating valid inferences by making incremental edits to natural language expressions with respect to a universal monotonicity calculus, without resorting to logical representation of the expressions (us...